Text retrieval from early printed books


Similar resources

Transfer Learning for OCRopus Model Training on Early Printed Books

A method is presented that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books when only small amounts of diplomatic transcriptions are available. This is achieved by building from already existing models during training instead of starting from scratch. To overcome the discrepancies between the set of characters of the pretra...

A Catalogue of Printed Books in the Wellcome Historical Medical Library. II—Books printed from 1641 to 1850 A—E

A Catalogue of Printed Books in the Wellcome Historical Medical Library. II-Books printed from 1641 to 1850 A-E, London, The Wellcome Historical Medical Library, 1966, pp. xi, 540, £10 10s. The second part of the Wellcome Catalogue of Printed Books, of which this volume is the first instalment, covers a period much less fully explored by bibliographers and historians than the first volume which ...

Printed Books to 1640 (Julian Roberts)

Bibliography is a technique proper to librarians. There are many outstanding exceptions to this resoundingly simple statement, but it nevertheless remains true that the librarian is the principal interpreter and beneficiary of the evidence which books, through their physical features, offer about themselves. One of the librarian's most elementary acts, that of cataloging, is bibliographical in ...

The Labeled Segmentation of Printed Books

We introduce the task of book structure labeling: segmenting and assigning a fixed category (such as TABLE OF CONTENTS, PREFACE, INDEX) to the document structure of printed books. We manually annotate the page-level structural categories for a large dataset totaling 294,816 pages in 1,055 books evenly sampled from 1750–1922, and present empirical results comparing the performance of several cl...

Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning

We combine three methods which significantly improve the OCR accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch. (2) Performing cross fold training on a single set of ground truth data (line images and their trans...

Journal

Journal title: International Journal on Document Analysis and Recognition (IJDAR)

Year: 2011

ISSN: 1433-2833,1433-2825

DOI: 10.1007/s10032-010-0146-0